https://towardsdatascience.com/an-end-to-end-data-science-project-that-will-boost-your-portfolio-c53cfe16f0e3

| | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |
| 1 | 2595 | Skylit Midtown Castle | 2845 | Jennifer | Manhattan | Midtown | 40.75362 | -73.98377 | Entire home/apt | 225 | 1 | 45 | 2019-05-21 | 0.38 | 2 | 355 |
| 2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.94190 | Private room | 150 | 3 | 0 | NaN | NaN | 1 | 365 |
| 3 | 3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | Entire home/apt | 89 | 1 | 270 | 2019-07-05 | 4.64 | 1 | 194 |
| 4 | 5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.79851 | -73.94399 | Entire home/apt | 80 | 10 | 9 | 2018-11-19 | 0.10 | 1 | 0 |
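
The five-row preview above and the profiling output that follows are the kind of result pandas-profiling produces from a DataFrame (the Reproduction details further down name pandas-profiling v3.1.1). A minimal sketch of how such a report might be generated is shown here. The CSV file name `AB_NYC_2019.csv`, the report title, and the output path are assumptions for illustration, not details taken from the source.

```python
def build_profile(csv_path="AB_NYC_2019.csv", out_path="airbnb_profile.html"):
    """Load the listings CSV and write a pandas-profiling HTML report.

    Both default paths are hypothetical placeholders.
    """
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    print(df.head())  # five-row preview, like the table above

    profile = ProfileReport(df, title="NYC Airbnb listings")
    profile.to_file(out_path)  # writes a standalone HTML report
    return profile
```

Calling `build_profile()` in a notebook would first print the preview and then write the HTML report, provided the CSV exists at the assumed path.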
  • Overview
  • Alerts (26)
  • Reproduction
Number of variables: 16
Number of observations: 48895
Missing cells: 20141
Missing cells (%): 2.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 23.5 MiB
Average record size in memory: 503.0 B
Numeric: 10
Categorical: 6
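
The missing-cell figures in this overview can be cross-checked against the per-variable sections of the same report, where `name`, `host_name`, `last_review`, and `reviews_per_month` are the only variables with a non-zero Missing count. A small arithmetic sketch, using only numbers quoted in the report:

```python
# Figures copied from the report: rows, columns, and per-column missing counts.
n_rows, n_cols = 48895, 16
missing_by_column = {
    "name": 16,
    "host_name": 21,
    "last_review": 10052,
    "reviews_per_month": 10052,
}

missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_cols)

print(missing_cells)          # 20141
print(round(missing_pct, 1))  # 2.6
```

The percentage is taken over all cells (rows times columns), which is why a 20.6% gap in two columns shows up as only 2.6% overall.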
name has a high cardinality: 47905 distinct values
host_name has a high cardinality: 11452 distinct values
neighbourhood has a high cardinality: 221 distinct values
last_review has a high cardinality: 1764 distinct values
id is highly correlated with host_id
host_id is highly correlated with id
number_of_reviews is highly correlated with reviews_per_month
reviews_per_month is highly correlated with number_of_reviews
id is highly correlated with host_id
host_id is highly correlated with id
number_of_reviews is highly correlated with reviews_per_month
reviews_per_month is highly correlated with number_of_reviews
number_of_reviews is highly correlated with reviews_per_month
reviews_per_month is highly correlated with number_of_reviews
id is highly correlated with host_id
host_id is highly correlated with id
neighbourhood_group is highly correlated with latitude and 1 other fields
latitude is highly correlated with neighbourhood_group and 1 other fields
longitude is highly correlated with neighbourhood_group and 1 other fields
last_review has 10052 (20.6%) missing values
reviews_per_month has 10052 (20.6%) missing values
minimum_nights is highly skewed (γ1 = 21.82727453)
name is uniformly distributed
id has unique values
number_of_reviews has 10052 (20.6%) zeros
availability_365 has 17533 (35.9%) zeros
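
The alerts report 10052 missing values for both `last_review` and `reviews_per_month`, the same count as the zeros in `number_of_reviews`. That suggests, although the report itself does not confirm it, that the missing values belong to listings that have never been reviewed. One possible treatment, sketched here on three rows from the preview and not taken from the source, is to set `reviews_per_month` to 0 for such rows:

```python
import math

# Three rows from the preview: id, number_of_reviews, reviews_per_month.
rows = [
    {"id": 2539, "number_of_reviews": 9, "reviews_per_month": 0.21},
    {"id": 3647, "number_of_reviews": 0, "reviews_per_month": math.nan},
    {"id": 3831, "number_of_reviews": 270, "reviews_per_month": 4.64},
]

def fill_unreviewed(row):
    """Return a copy with reviews_per_month set to 0.0 when there are no reviews."""
    filled = dict(row)
    if row["number_of_reviews"] == 0 and math.isnan(row["reviews_per_month"]):
        filled["reviews_per_month"] = 0.0
    return filled

cleaned = [fill_unreviewed(r) for r in rows]
print([r["reviews_per_month"] for r in cleaned])  # [0.21, 0.0, 4.64]
```

Whether 0 is the right fill value depends on the downstream use; it is an assumption here, and `last_review` has no equally natural default.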
Analysis started: 2022-02-15 23:14:07.337485
Analysis finished: 2022-02-15 23:14:32.173862
Duration: 24.84 seconds
Software version: pandas-profiling v3.1.1
id

id
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
UNIQUE

Distinct
48895
Distinct (%)
100.0%
Missing
0
Missing (%)
0.0%
Infinite
0
Infinite (%)
0.0%
Mean
19017143.24
Minimum
2539
Maximum
36487245
Zeros
0
Zeros (%)
0.0%
Negative
0
Negative (%)
0.0%
Memory size
382.1 KiB
  • Statistics
  • Histogram
  • Common values
  • Extreme values
Minimum
2539
5-th percentile
1222382.7
Q1
9471945
median
19677284
Q3
29152178.5
95-th percentile
35259101.2
Maximum
36487245
Range
36484706
Interquartile range (IQR)
19680233.5
Standard deviation
10983108.39
Coefficient of variation (CV)
0.5775372383
Kurtosis
-1.227748342
Mean
19017143.24
Median Absolute Deviation (MAD)
9908242
Skewness
-0.09025737546
Sum
9.298432185 × 10^11
Variance
1.206286698 × 10^14
Monotonicity
Strictly increasing
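
Several of the statistics listed for `id` are derived from the others, so they can be recomputed as a sanity check. The sketch below uses only values quoted above; the variable names are illustrative.

```python
# Values copied from the id statistics above.
minimum, maximum = 2539, 36487245
q1, q3 = 9471945, 29152178.5
mean, std = 19017143.24, 10983108.39

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance

print(value_range)         # 36484706
print(iqr)                 # 19680233.5
print(round(cv, 4))        # 0.5775
print(f"{variance:.3e}")   # 1.206e+14
```

The same relationships hold for the other numeric variables in the report.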

Histogram with fixed size bins (bins=50)

2539
1
25583366
1
25551687
1
25552076
1
25554120
1
25568873
1
25571627
1
25572892
1
25580113
1
25580283
1
Other values (48885)
48885
  • Minimum 10 values
  • Maximum 10 values
2539
1
2595
1
3647
1
3831
1
5022
1
5099
1
5121
1
5178
1
5203
1
5238
1
36487245
1
36485609
1
36485431
1
36485057
1
36484665
1
36484363
1
36484087
1
36483152
1
36483010
1
36482809
1
name

name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct
47905
Distinct (%)
98.0%
Missing
16
Missing (%)
< 0.1%
Memory size
4.4 MiB
  • Overview
  • Categories
  • Words
  • Characters
Max length
179
Median length
37
Mean length
36.91114794
Min length
1
Total characters
1804180
Distinct characters
776
Distinct categories
20
Distinct scripts
11
Distinct blocks
17
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
47260
Unique (%)
96.7%
1st row
Clean & quiet apt home by the park
2nd row
Skylit Midtown Castle
3rd row
THE VILLAGE OF HARLEM....NEW YORK !
4th row
Cozy Entire Floor of Brownstone
5th row
Entire Apt: Spacious Studio/Loft by central park

Common Values

Hillside Hotel
18
Home away from home
17
New york Multi-unit building
16
Brooklyn Apartment
12
Private Room
11
Loft Suite @ The Box House Hotel
11
Private room
10
Artsy Private BR in Fort Greene Cumberland
10
Beautiful Brooklyn Brownstone
8
Private room in Brooklyn
8
Other values (47895)
48758
(Missing)
16

Length

Histogram of lengths of the category

in
16752
room
10038
8430
bedroom
7601
private
7158
apartment
6695
cozy
4991
apt
4618
brooklyn
4049
studio
3988
Other values (12552)
224301
  • Characters
  • Categories
  • Scripts
  • Blocks
Most occurring characters
251424
e
124635
o
122324
t
105261
a
103586
r
97946
i
94651
n
94611
l
51723
m
49121
Other values (766)
708898
Most occurring categories
Lowercase Letter
1206208
Uppercase Letter
270574
Space Separator
251428
Other Punctuation
33826
Decimal Number
25321
Dash Punctuation
6878
Math Symbol
2738
Other Letter
2547
Close Punctuation
1537
Open Punctuation
1395
Other values (10)
1728
Most frequent character per category
Other Letter
房
82
家
46
中
44
间
41
的
38
拉
37
法
36
盛
36
大
30
约
29
Other values (520)
2128
Lowercase Letter
e
124635
o
122324
t
105261
a
103586
r
97946
i
94651
n
94611
l
51723
m
49121
s
48092
Other values (58)
314258
Other Symbol
★
266
❤
168
☆
105
♥
38
⭐
35
✨
34
❥
25
✿
15
☀
15
✰
14
Other values (50)
164
Uppercase Letter
B
29965
S
26481
C
20989
A
19424
R
17945
P
14623
E
14350
L
14062
M
11930
N
11701
Other values (33)
89104
Other Punctuation
,
9177
!
7855
/
5230
.
4375
&
3182
'
1074
*
1021
:
597
#
555
"
294
Other values (11)
466
Math Symbol
+
1382
|
992
~
271
=
34
>
25
<
20
→
6
⋆
4
√
2
×
1
Decimal Number
1
8661
2
6830
3
2560
5
2164
0
2115
4
1307
6
569
7
450
8
399
9
266
Close Punctuation
)
1480
]
37
}
9
】
8
》
3
Open Punctuation
(
1339
[
36
{
9
【
8
《
3
Dash Punctuation
-
6804
—
47
–
26
―
1
Modifier Letter
゙
21
ー
11
゚
5
Modifier Symbol
^
9
`
4
´
3
Space Separator
251424
 
4
Final Punctuation
’
200
”
38
Nonspacing Mark
️
165
︎
14
Connector Punctuation
_
42
‿
1
Initial Punctuation
“
40
‘
8
Control
185
Currency Symbol
$
94
Other Number
²
9
Most occurring scripts
Latin
1476579
Common
324672
Han
2237
Cyrillic
191
Inherited
179
Katakana
136
Hiragana
70
Hangul
70
Hebrew
31
Georgian
13
Most frequent character per script
Han
房
82
家
46
中
44
间
41
的
38
拉
37
法
36
盛
36
大
30
约
29
Other values (401)
1818
Common
251424
,
9177
1
8661
!
7855
2
6830
-
6804
/
5230
.
4375
&
3182
3
2560
Other values (123)
18574
Latin
e
124635
o
122324
t
105261
a
103586
r
97946
i
94651
n
94611
l
51723
m
49121
s
48092
Other values (68)
584629
Hangul
한
7
웃
3
성
3
리
2
맨
2
건
2
물
2
은
2
작
2
뜻
2
Other values (38)
43
Cyrillic
а
26
о
18
т
17
н
15
е
13
р
11
к
11
м
10
в
9
с
9
Other values (23)
52
Katakana
ン
14
ク
12
リ
10
ハ
9
ッ
9
ア
9
ス
8
ト
7
ウ
6
フ
6
Other values (22)
46
Hiragana
の
16
で
7
か
7
ら
6
お
5
い
4
な
4
に
3
く
2
き
2
Other values (13)
14
Hebrew
ו
5
י
5
ר
4
ב
4
ת
2
ע
2
ה
2
ד
1
ש
1
ל
1
Other values (4)
4
Inherited
️
165
︎
14
Georgian
ღ
13
Devanagari
ॐ
2
Most occurring blocks
ASCII
1799687
CJK
2237
Misc Symbols
500
None
431
Punctuation
423
Dingbats
320
Cyrillic
191
VS
179
Hiragana
70
Hangul
70
Other values (7)
72
Most frequent character per block
ASCII
251424
e
124635
o
122324
t
105261
a
103586
r
97946
i
94651
n
94611
l
51723
m
49121
Other values (86)
704405
Misc Symbols
★
266
☆
105
♥
38
☀
15
♀
11
⚡
8
♦
6
♡
6
♛
6
⚓
6
Other values (12)
33
Punctuation
’
200
•
62
—
47
“
40
”
38
–
26
‘
8
‿
1
―
1
Dingbats
❤
168
✨
34
❥
25
✿
15
✰
14
✴
11
✪
8
✌
6
➡
5
✺
4
Other values (13)
30
VS
️
165
︎
14
CJK
房
82
家
46
中
44
间
41
的
38
拉
37
法
36
盛
36
大
30
约
29
Other values (401)
1818
None
⭐
35
à
28
ó
24
゙
21
é
16
。
15
ン
14
·
13
ク
12
ー
11
Other values (70)
242
Cyrillic
а
26
о
18
т
17
н
15
е
13
р
11
к
11
м
10
в
9
с
9
Other values (23)
52
Hiragana
の
16
で
7
か
7
ら
6
お
5
い
4
な
4
に
3
く
2
き
2
Other values (13)
14
Georgian
ღ
13
Hangul
한
7
웃
3
성
3
리
2
맨
2
건
2
물
2
은
2
작
2
뜻
2
Other values (38)
43
Hebrew
ו
5
י
5
ר
4
ב
4
ת
2
ע
2
ה
2
ד
1
ש
1
ל
1
Other values (4)
4
Misc Technical
⍟
4
⏪
1
⏩
1
⌚
1
Geometric Shapes
▲
4
◈
2
△
2
◔
2
▶
1
Math Operators
⋆
4
√
2
⊹
1
Devanagari
ॐ
2
Letterlike Symbols
™
1
host_id

host_id
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct
37457
Distinct (%)
76.6%
Missing
0
Missing (%)
0.0%
Infinite
0
Infinite (%)
0.0%
Mean
67620010.65
Minimum
2438
Maximum
274321313
Zeros
0
Zeros (%)
0.0%
Negative
0
Negative (%)
0.0%
Memory size
382.1 KiB
  • Statistics
  • Histogram
  • Common values
  • Extreme values
Minimum
2438
5-th percentile
815564.1
Q1
7822033
median
30793816
Q3
107434423
95-th percentile
241764600.2
Maximum
274321313
Range
274318875
Interquartile range (IQR)
99612390
Standard deviation
78610967.03
Coefficient of variation (CV)
1.162539998
Kurtosis
0.1691057556
Mean
67620010.65
Median Absolute Deviation (MAD)
27543913
Skewness
1.206213924
Sum
3.306280421 × 10^12
Variance
6.179684138 × 10^15
Monotonicity
Not monotonic

Histogram with fixed size bins (bins=50)

219517861
327
107434423
232
30283594
121
137358866
103
16098958
96
12243051
96
61391963
91
22541573
87
200380610
65
7503643
52
Other values (37447)
47625
  • Minimum 10 values
  • Maximum 10 values
2438
1
2571
1
2787
6
2845
2
2868
1
2881
2
3151
1
3211
1
3415
1
3563
1
274321313
1
274311461
1
274307600
1
274298453
1
274273284
1
274225617
1
274195458
1
274188386
1
274103383
1
274079964
1
host_name

host_name
Categorical

HIGH CARDINALITY

Distinct
11452
Distinct (%)
23.4%
Missing
21
Missing (%)
< 0.1%
Memory size
3.0 MiB
  • Overview
  • Categories
  • Words
  • Characters
Max length
35
Median length
6
Mean length
6.12487212
Min length
1
Total characters
299347
Distinct characters
204
Distinct categories
15
Distinct scripts
7
Distinct blocks
9
Unique
6903
Unique (%)
14.1%
1st row
John
2nd row
Jennifer
3rd row
Elisabeth
4th row
LisaRoxanne
5th row
Laura

Common Values

Michael
417
David
403
Sonder (NYC)
327
John
294
Alex
279
Blueground
232
Sarah
227
Daniel
226
Jessica
205
Maria
204
Other values (11442)
46060

Length

Histogram of lengths of the category

1120
and
625
michael
460
david
449
sonder
423
nyc
338
john
337
alex
330
laura
293
maria
244
Other values (10259)
49968
  • Characters
  • Categories
  • Scripts
  • Blocks
Most occurring characters
a
37929
e
28680
i
24284
n
24092
r
17861
l
15327
o
12743
t
9401
s
9147
h
9040
Other values (194)
110843
Most occurring categories
Lowercase Letter
235916
Uppercase Letter
54823
Space Separator
5811
Other Punctuation
1592
Open Punctuation
381
Close Punctuation
379
Dash Punctuation
209
Other Letter
110
Decimal Number
84
Math Symbol
34
Other values (5)
8
Most frequent character per category
Other Letter
明
6
德
5
青
5
美
5
文
4
铀
3
春
3
常
3
辣
2
柏
2
Other values (62)
72
Lowercase Letter
a
37929
e
28680
i
24284
n
24092
r
17861
l
15327
o
12743
t
9401
s
9147
h
9040
Other values (54)
47412
Uppercase Letter
A
6458
J
5458
M
5298
S
4744
C
3737
L
2885
D
2752
K
2618
R
2566
E
2361
Other values (28)
15946
Decimal Number
5
20
7
14
0
14
2
11
4
7
1
7
6
4
3
4
8
2
9
1
Other Punctuation
&
1162
.
309
/
41
,
35
'
25
@
8
"
6
!
4
:
2
Space Separator
5805
 
6
Open Punctuation
(
381
Close Punctuation
)
379
Dash Punctuation
-
209
Math Symbol
+
34
Other Symbol
★
2
Final Punctuation
’
2
Format
​
2
Connector Punctuation
_
1
Currency Symbol
£
1
Most occurring scripts
Latin
290683
Common
8498
Han
91
Cyrillic
56
Hangul
11
Hebrew
5
Hiragana
3
Most frequent character per script
Latin
a
37929
e
28680
i
24284
n
24092
r
17861
l
15327
o
12743
t
9401
s
9147
h
9040
Other values (70)
102179
Han
明
6
德
5
青
5
美
5
文
4
铀
3
春
3
常
3
辣
2
柏
2
Other values (45)
53
Common
5805
&
1162
(
381
)
379
.
309
-
209
/
41
,
35
+
34
'
25
Other values (20)
118
Cyrillic
а
6
е
6
н
6
л
4
А
4
и
4
р
3
й
3
с
3
к
3
Other values (12)
14
Hangul
소
2
정
2
비
1
나
1
빈
1
단
1
진
1
선
1
현
1
Hebrew
י
1
ד
1
נ
1
ל
1
א
1
Hiragana
あ
1
り
1
ゆ
1
Most occurring blocks
ASCII
298922
None
247
CJK
91
Cyrillic
56
Hangul
11
Punctuation
10
Hebrew
5
Hiragana
3
Misc Symbols
2
Most frequent character per block
ASCII
a
37929
e
28680
i
24284
n
24092
r
17861
l
15327
o
12743
t
9401
s
9147
h
9040
Other values (67)
110418
None
é
107
í
24
á
22
ú
19
ë
13
ô
11
ó
9
è
7
ç
5
ı
4
Other values (19)
26
Cyrillic
а
6
е
6
н
6
л
4
А
4
и
4
р
3
й
3
с
3
к
3
Other values (12)
14
CJK
明
6
德
5
青
5
美
5
文
4
铀
3
春
3
常
3
辣
2
柏
2
Other values (45)
53
Punctuation
 
6
’
2
​
2
Misc Symbols
★
2
Hangul
소
2
정
2
비
1
나
1
빈
1
단
1
진
1
선
1
현
1
Hebrew
י
1
ד
1
נ
1
ל
1
א
1
Hiragana
あ
1
り
1
ゆ
1
neighbourhood_group

neighbourhood_group
Categorical

HIGH CORRELATION

Distinct
5
Distinct (%)
< 0.1%
Missing
0
Missing (%)
0.0%
Memory size
3.0 MiB
  • Overview
  • Categories
  • Words
  • Characters
Max length
13
Median length
8
Mean length
8.182452193
Min length
5
Total characters
400081
Distinct characters
20
Distinct categories
3
Distinct scripts
2
Distinct blocks
1
Unique
0
Unique (%)
0.0%
1st row
Brooklyn
2nd row
Manhattan
3rd row
Manhattan
4th row
Brooklyn
5th row
Manhattan

Common Values

Manhattan
21661
Brooklyn
20104
Queens
5666
Bronx
1091
Staten Island
373

Length

Histogram of lengths of the category

Pie chart

manhattan
21661
brooklyn
20104
queens
5666
bronx
1091
staten
373
island
373
  • Characters
  • Categories
  • Scripts
  • Blocks
Most occurring characters
n
70929
a
65729
t
44068
o
41299
M
21661
h
21661
B
21195
r
21195
l
20477
y
20104
Other values (10)
51763
Most occurring categories
Lowercase Letter
350440
Uppercase Letter
49268
Space Separator
373
Most frequent character per category
Lowercase Letter
n
70929
a
65729
t
44068
o
41299
h
21661
r
21195
l
20477
y
20104
k
20104
e
11705
Other values (4)
13169
Uppercase Letter
M
21661
B
21195
Q
5666
S
373
I
373
Space Separator
373
Most occurring scripts
Latin
399708
Common
373
Most frequent character per script
Latin
n
70929
a
65729
t
44068
o
41299
M
21661
h
21661
B
21195
r
21195
l
20477
y
20104
Other values (9)
51390
Common
373
Most occurring blocks
ASCII
400081
Most frequent character per block
ASCII
n
70929
a
65729
t
44068
o
41299
M
21661
h
21661
B
21195
r
21195
l
20477
y
20104
Other values (10)
51763
neighbourhood

neighbourhood
Categorical

HIGH CARDINALITY

Distinct
221
Distinct (%)
0.5%
Missing
0
Missing (%)
0.0%
Memory size
3.2 MiB
  • Overview
  • Categories
  • Words
  • Characters
Max length
26
Median length
12
Mean length
11.89479497
Min length
4
Total characters
581596
Distinct characters
54
Distinct categories
5
Distinct scripts
2
Distinct blocks
1
Unique
6
Unique (%)
< 0.1%
1st row
Kensington
2nd row
Midtown
3rd row
Harlem
4th row
Clinton Hill
5th row
East Harlem

Common Values

Williamsburg
3920
Bedford-Stuyvesant
3714
Harlem
2658
Bushwick
2465
Upper West Side
1971
Hell's Kitchen
1958
East Village
1853
Upper East Side
1798
Crown Heights
1564
Midtown
1545
Other values (211)
25449

Length

Histogram of lengths of the category

east
6592
side
4680
williamsburg
3920
harlem
3775
upper
3769
bedford-stuyvesant
3714
heights
3586
village
3164
west
2759
bushwick
2465
Other values (233)
40681
  • Characters
  • Categories
  • Scripts
  • Blocks
Most occurring characters
e
53470
i
42282
s
39625
t
38587
a
37608
l
34448
r
33667
30210
n
26099
o
24032
Other values (44)
221568
Most occurring categories
Lowercase Letter
461107
Uppercase Letter
83934
Space Separator
30210
Dash Punctuation
4251
Other Punctuation
2094
Most frequent character per category
Lowercase Letter
e
53470
i
42282
s
39625
t
38587
a
37608
l
34448
r
33667
n
26099
o
24032
d
19663
Other values (15)
111626
Uppercase Letter
H
11901
S
11483
B
8374
W
8185
E
7084
C
5327
U
3833
G
3723
F
3281
V
3209
Other values (14)
17534
Other Punctuation
'
1968
.
124
,
2
Space Separator
30210
Dash Punctuation
-
4251
Most occurring scripts
Latin
545041
Common
36555
Most frequent character per script
Latin
e
53470
i
42282
s
39625
t
38587
a
37608
l
34448
r
33667
n
26099
o
24032
d
19663
Other values (39)
195560
Common
30210
-
4251
'
1968
.
124
,
2
Most occurring blocks
ASCII
581596
Most frequent character per block
ASCII
e
53470
i
42282
s
39625
t
38587
a
37608
l
34448
r
33667
30210
n
26099
o
24032
Other values (44)
221568
latitude

latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct
19048
Distinct (%)
39.0%
Missing
0
Missing (%)
0.0%
Infinite
0
Infinite (%)
0.0%
Mean
40.72894888
Minimum
40.49979
Maximum
40.91306
Zeros
0
Zeros (%)
0.0%
Negative
0
Negative (%)
0.0%
Memory size
382.1 KiB
  • Statistics
  • Histogram
  • Common values
  • Extreme values
Minimum
40.49979
5-th percentile
40.646114
Q1
40.6901
median
40.72307
Q3
40.763115
95-th percentile
40.825643
Maximum
40.91306
Range
0.41327
Interquartile range (IQR)
0.073015
Standard deviation
0.05453007806
Coefficient of variation (CV)
0.001338853065
Kurtosis
0.1488446574
Mean
40.72894888
Median Absolute Deviation (MAD)
0.03642
Skewness
0.2371665585
Sum
1991441.956
Variance
0.002973529413
Monotonicity
Not monotonic

Histogram with fixed size bins (bins=50)

40.71813
18
40.68444
13
40.69414
13
40.68634
13
40.76125
12
40.68537
12
40.71171
12
40.71353
12
40.76189
12
40.68683
11
Other values (19038)
48767
  • Minimum 10 values
  • Maximum 10 values
40.49979
1
40.50641
1
40.50708
1
40.50868
1
40.50873
1
40.50943
1
40.51133
1
40.52211
1
40.52293
1
40.527
1
40.91306
1
40.91234
1
40.91169
1
40.91167
1
40.90804
1
40.90734
1
40.90527
1
40.90484
1
40.90406
1
40.90391
1
longitude

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct
14718
Distinct (%)
30.1%
Missing
0
Missing (%)
0.0%
Infinite
0
Infinite (%)
0.0%
Mean
-73.95216961
Minimum
-74.24442
Maximum
-73.71299
Zeros
0
Zeros (%)
0.0%
Negative
48895
Negative (%)
100.0%
Memory size
382.1 KiB
  • Statistics
  • Histogram
  • Common values
  • Extreme values
Minimum
-74.24442
5-th percentile
-74.00388
Q1
-73.98307
median
-73.95568
Q3
-73.936275
95-th percentile
-73.865771
Maximum
-73.71299
Range
0.53143
Interquartile range (IQR)
0.046795
Standard deviation
0.04615673611
Coefficient of variation (CV)
-0.0006241430961
Kurtosis
5.021646112
Mean
-73.95216961
Median Absolute Deviation (MAD)
0.02485
Skewness
1.284210209
Sum
-3615891.333
Variance
0.002130444288
Monotonicity
Not monotonic

Histogram with fixed size bins (bins=50)

-73.95677
18
-73.95427
18
-73.95405
17
-73.9506
16
-73.94791
16
-73.95332
16
-73.95136
16
-73.95669
15
-73.95742
15
-73.94537
15
Other values (14708)
48733
  • Minimum 10 values
  • Maximum 10 values
-74.24442
1
-74.24285
1
-74.24084
1
-74.23986
1
-74.23914
1
-74.23803
1
-74.23059
1
-74.21238
1
-74.21017
1
-74.20941
1
-73.71299
1
-73.7169
1
-73.71795
1
-73.71829
1
-73.71928
1
-73.72173
1
-73.72179
1
-73.72247
1
-73.72435
1
-73.72581
1
room_type

room_type
Categorical

Distinct
3
Distinct (%)
< 0.1%
Missing
0
Missing (%)
0.0%
Memory size
3.3 MiB
  • Overview
  • Categories
  • Words
  • Characters
Max length
15
Median length
15
Mean length
13.53526945
Min length
11
Total characters
661807
Distinct characters
17
Distinct categories
4
Distinct scripts
2
Distinct blocks
1
Unique
0
Unique (%)
0.0%
1st row
Private room
2nd row
Entire home/apt
3rd row
Private room
4th row
Entire home/apt
5th row
Entire home/apt

Common Values

Entire home/apt
25409
Private room
22326
Shared room
1160
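
The report lists raw counts for the three `room_type` values; their shares of all listings follow directly. A short sketch using only the counts above:

```python
# Counts copied from the room_type Common Values above.
counts = {"Entire home/apt": 25409, "Private room": 22326, "Shared room": 1160}

total = sum(counts.values())  # equals the number of observations
shares = {room: 100 * n / total for room, n in counts.items()}

for room, pct in shares.items():
    print(f"{room}: {pct:.1f}%")
# Entire home/apt: 52.0%
# Private room: 45.7%
# Shared room: 2.4%
```

The counts sum to 48895, consistent with the variable having no missing values.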

Length

Histogram of lengths of the category

Pie chart

entire
25409
home/apt
25409
room
23486
private
22326
shared
1160
  • Characters
  • Categories
  • Scripts
  • Blocks
Most occurring characters
e
74304
t
73144
o
72381
r
72381
a
48895
48895
m
48895
i
47735
h
26569
p
25409
Other values (7)
123199
Most occurring categories
Lowercase Letter
538608
Space Separator
48895
Uppercase Letter
48895
Other Punctuation
25409
Most frequent character per category
Lowercase Letter
e
74304
t
73144
o
72381
r
72381
a
48895
m
48895
i
47735
h
26569
p
25409
n
25409
Other values (2)
23486
Uppercase Letter
E
25409
P
22326
S
1160
Space Separator
48895
Other Punctuation
/
25409
Most occurring scripts
Latin
587503
Common
74304
Most frequent character per script
Latin
e
74304
t
73144
o
72381
r
72381
a
48895
m
48895
i
47735
h
26569
p
25409
E
25409
Other values (5)
72381
Common
48895
/
25409
Most occurring blocks
ASCII
661807
Most frequent character per block
ASCII
e
74304
t
73144
o
72381
r
72381
a
48895
48895
m
48895
i
47735
h
26569
p
25409
Other values (7)
123199
price

price
Real number (ℝ≥0)

Distinct
674
Distinct (%)
1.4%
Missing
0
Missing (%)
0.0%
Infinite
0
Infinite (%)
0.0%
Mean
152.7206872
Minimum
0
Maximum
10000
Zeros
11
Zeros (%)
< 0.1%
Negative
0
Negative (%)
0.0%
Memory size
382.1 KiB
  • Statistics
  • Histogram
  • Common values
  • Extreme values
Minimum
0
5-th percentile
40
Q1
69
median
106
Q3
175
95-th percentile
355
Maximum
10000
Range
10000
Interquartile range (IQR)
106
Standard deviation
240.1541697
Coefficient of variation (CV)
1.572505822
Kurtosis
585.6728789
Mean
152.7206872
Median Absolute Deviation (MAD)
46
Skewness
19.118939
Sum
7467278
Variance
57674.02525
Monotonicity
Not monotonic
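
The `price` statistics show a long right tail: skewness of about 19.1 and a maximum of 10000 against a 95th percentile of 355. The report does not flag outliers itself. One common rule of thumb, used here as an assumption and not as the report's method, is Tukey's fences at 1.5 × IQR beyond the quartiles:

```python
q1, q3 = 69, 175   # price quartiles from the statistics above
iqr = q3 - q1      # 106, matching the reported IQR
k = 1.5            # conventional Tukey multiplier (an assumption here)

lower_fence = max(0, q1 - k * iqr)  # clamp at 0 because prices cannot be negative
upper_fence = q3 + k * iqr

print(lower_fence, upper_fence)  # 0 334.0
```

Because the 95th percentile (355) already lies above the upper fence of 334, this rule would mark more than 5% of listings as high-price outliers, so a log transform or a percentile cap may be a gentler alternative.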

Histogram with fixed size bins (bins=50)

100
2051
150
2047
50
1534
60
1458
200
1401
75
1370
80
1272
65
1190
70
1170
120
1130
Other values (664)
34272
  • Minimum 10 values
  • Maximum 10 values
0
11
10
17
11
3
12
4
13
1
15
6
16
6
18
2
19
4
20
33
10000
3
9999
3
8500
1
8000
1
7703
1
7500
2
6800
1
6500
3
6419
1
6000
2
minimum_nights

minimum_nights
Real number (ℝ≥0)

SKEWED

Distinct
109
Distinct (%)
0.2%
Missing
0
Missing (%)
0.0%
Infinite
0
Infinite (%)
0.0%
Mean
7.029962164
Minimum
1
Maximum
1250
Zeros
0
Zeros (%)
0.0%
Negative
0
Negative (%)
0.0%
Memory size
382.1 KiB
  • Statistics
  • Histogram
  • Common values
  • Extreme values
Minimum
1
5-th percentile
1
Q1
1
median
3
Q3
5
95-th percentile
30
Maximum
1250
Range
1249
Interquartile range (IQR)
4
Standard deviation
20.51054953
Coefficient of variation (CV)
2.917590316
Kurtosis
854.0716624
Mean
7.029962164
Median Absolute Deviation (MAD)
2
Skewness
21.82727453
Sum
343730
Variance
420.6826422
Monotonicity
Not monotonic

Histogram with fixed size bins (bins=50)

1
12720
2
11696
3
7999
30
3760
4
3303
5
3034
7
2058
6
752
14
562
10
483
Other values (99)
2528
  • Minimum 10 values
  • Maximum 10 values
1
12720
2
11696
3
7999
4
3303
5
3034
6
752
7
2058
8
130
9
80
10
483
1250
1
1000
1
999
3
500
5
480
1
400
1
370
1
366
1
365
29
364
1
number_of_reviews

number_of_reviews
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct
394
Distinct (%)
0.8%
Missing
0
Missing (%)
0.0%
Infinite
0
Infinite (%)
0.0%
Mean
23.27446569
Minimum
0
Maximum
629
Zeros
10052
Zeros (%)
20.6%
Negative
0
Negative (%)
0.0%
Memory size
382.1 KiB
  • Statistics
  • Histogram
  • Common values
  • Extreme values
Minimum
0
5-th percentile
0
Q1
1
median
5
Q3
24
95-th percentile
114
Maximum
629
Range
629
Interquartile range (IQR)
23
Standard deviation
44.55058227
Coefficient of variation (CV)
1.91413985
Kurtosis
19.52978807
Mean
23.27446569
Median Absolute Deviation (MAD)
5
Skewness
3.690634572
Sum
1138005
Variance
1984.75438
Monotonicity
Not monotonic

Histogram with fixed size bins (bins=50)

0
10052
1
5244
2
3465
3
2520
4
1994
5
1618
6
1357
7
1179
8
1127
9
964
Other values (384)
19375
  • Minimum 10 values
  • Maximum 10 values
0
10052
1
5244
2
3465
3
2520
4
1994
5
1618
6
1357
7
1179
8
1127
9
964
629
1
607
1
597
1
594
1
576
1
543
1
540
1
510
1
488
1
480
1
last_review

last_review
Categorical

HIGH CARDINALITY
MISSING

Distinct
1764
Distinct (%)
4.5%
Missing
10052
Missing (%)
20.6%
Memory size
2.8 MiB
  • Overview
  • Categories
  • Words
  • Characters
Max length
10
Median length
10
Mean length
10
Min length
10
Total characters
388430
Distinct characters
11
Distinct categories
2
Distinct scripts
1
Distinct blocks
1
Unique
236
Unique (%)
0.6%
1st row
2018-10-19
2nd row
2019-05-21
3rd row
2019-07-05
4th row
2018-11-19
5th row
2019-06-22

Common Values

2019-06-23
1413
2019-07-01
1359
2019-06-30
1341
2019-06-24
875
2019-07-07
718
2019-07-02
658
2019-06-22
655
2019-06-16
601
2019-07-05
580
2019-07-06
565
Other values (1754)
30078
(Missing)
10052

Length

Histogram of lengths of the category

2019-06-23
1413
2019-07-01
1359
2019-06-30
1341
2019-06-24
875
2019-07-07
718
2019-07-02
658
2019-06-22
655
2019-06-16
601
2019-07-05
580
2019-07-06
565
Other values (1754)
30078
  • Characters
  • Categories
  • Scripts
  • Blocks
Most occurring characters
0
92333
-
77686
1
62027
2
58684
9
30106
6
19890
7
12824
8
10838
5
9577
3
8764
Most occurring categories
Decimal Number
310744
Dash Punctuation
77686
Most frequent character per category
Decimal Number
0
92333
1
62027
2
58684
9
30106
6
19890
7
12824
8
10838
5
9577
3
8764
4
5701
Dash Punctuation
-
77686
Most occurring scripts
Common
388430
Most frequent character per script
Common
0
92333
-
77686
1
62027
2
58684
9
30106
6
19890
7
12824
8
10838
5
9577
3
8764
Most occurring blocks
ASCII
388430
Most frequent character per block
ASCII
0
92333
-
77686
1
62027
2
58684
9
30106
6
19890
7
12824
8
10838
5
9577
3
8764
reviews_per_month

reviews_per_month
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct
937
Distinct (%)
2.4%
Missing
10052
Missing (%)
20.6%
Infinite
0
Infinite (%)
0.0%
Mean
1.37322143
Minimum
0.01
Maximum
58.5
Zeros
0
Zeros (%)
0.0%
Negative
0
Negative (%)
0.0%
Memory size
382.1 KiB
  • Statistics
  • Histogram
  • Common values
  • Extreme values
Minimum
0.01
5-th percentile
0.04
Q1
0.19
median
0.72
Q3
2.02
95-th percentile
4.64
Maximum
58.5
Range
58.49
Interquartile range (IQR)
1.83
Standard deviation
1.680441995
Coefficient of variation (CV)
1.223722525
Kurtosis
42.49346948
Mean
1.37322143
Median Absolute Deviation (MAD)
0.62
Skewness
3.130188536
Sum
53340.04
Variance
2.823885299
Monotonicity
Not monotonic

Histogram with fixed size bins (bins=50)

0.02
919
1.0
893
0.05
893
0.03
804
0.16
667
0.04
655
0.08
596
0.09
593
0.06
579
0.11
539
Other values (927)
31705
(Missing)
10052
  • Minimum 10 values
  • Maximum 10 values
0.01
42
0.02
919
0.03
804
0.04
655
0.05
893
0.06
579
0.07
466
0.08
596
0.09
593
0.1
457
58.5
1
27.95
1
20.94
1
19.75
1
17.82
1
16.81
1
16.22
1
16.03
1
15.78
1
15.32
1
calculated_host_listings_count

calculated_host_listings_count
Real number (ℝ≥0)

Distinct
47
Distinct (%)
0.1%
Missing
0
Missing (%)
0.0%
Infinite
0
Infinite (%)
0.0%
Mean
7.143982002
Minimum
1
Maximum
327
Zeros
0
Zeros (%)
0.0%
Negative
0
Negative (%)
0.0%
Memory size
382.1 KiB
  • Statistics
  • Histogram
  • Common values
  • Extreme values
Minimum
1
5-th percentile
1
Q1
1
median
1
Q3
2
95-th percentile
15
Maximum
327
Range
326
Interquartile range (IQR)
1
Standard deviation
32.95251885
Coefficient of variation (CV)
4.612626241
Kurtosis
67.5508883
Mean
7.143982002
Median Absolute Deviation (MAD)
0
Skewness
7.9331739
Sum
349305
Variance
1085.868499
Monotonicity
Not monotonic

Histogram with fixed size bins (bins=47)

1
32303
2
6658
3
2853
4
1440
5
845
6
570
8
416
7
399
327
327
9
234
Other values (37)
2850
  • Minimum 10 values
  • Maximum 10 values
1
32303
2
6658
3
2853
4
1440
5
845
6
570
7
399
8
416
9
234
10
210
327
327
232
232
121
121
103
103
96
192
91
91
87
87
65
65
52
104
50
50
availability_365

availability_365
Real number (ℝ≥0)

ZEROS

Distinct
366
Distinct (%)
0.7%
Missing
0
Missing (%)
0.0%
Infinite
0
Infinite (%)
0.0%
Mean
112.7813273
Minimum
0
Maximum
365
Zeros
17533
Zeros (%)
35.9%
Negative
0
Negative (%)
0.0%
Memory size
382.1 KiB
  • Statistics
  • Histogram
  • Common values
  • Extreme values
Minimum
0
5-th percentile
0
Q1
0
median
45
Q3
227
95-th percentile
359
Maximum
365
Range
365
Interquartile range (IQR)
227
Standard deviation
131.6222889
Coefficient of variation (CV)
1.167057455
Kurtosis
-0.9975340452
Mean
112.7813273
Median Absolute Deviation (MAD)
45
Skewness
0.7634075771
Sum
5514443
Variance
17324.42692
Monotonicity
Not monotonic

Histogram with fixed size bins (bins=50)

0
17533
365
1295
364
491
1
408
89
361
5
340
3
306
179
301
90
290
2
270
Other values (356)
27300
  • Minimum 10 values
  • Maximum 10 values
0
17533
1
408
2
270
3
306
4
233
5
340
6
245
7
219
8
233
9
193
365
1295
364
491
363
239
362
166
361
111
360
102
359
135
358
180
357
95
356
78
Interactions: pairwise scatter plots of the numeric variables (id, host_id, latitude, longitude, price, minimum_nights, number_of_reviews, reviews_per_month, calculated_host_listings_count, availability_365); plots not reproduced here.
  • Spearman's ρ
  • Pearson's r
  • Kendall's τ
  • Cramér's V (φc)
  • Phik (φk)

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
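
As a rough illustration of that definition, the sketch below ranks two toy series and then applies the covariance-over-standard-deviations formula to the ranks. The function name `spearman_rho` and the toy data are assumptions made for this example; they are not taken from the report.

```python
import numpy as np
import pandas as pd


def spearman_rho(x, y):
    """Spearman's rho: covariance of the ranks divided by the product of their std devs."""
    rx = pd.Series(x).rank()  # average ranks, so ties share a rank
    ry = pd.Series(y).rank()
    cov = ((rx - rx.mean()) * (ry - ry.mean())).mean()
    return cov / (rx.std(ddof=0) * ry.std(ddof=0))


# Toy data (assumed): y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

rho = spearman_rho(x, y)
pearson = float(np.corrcoef(x, y)[0, 1])
print(rho, pearson)
```

Because the ranks of `x` and `y` coincide, ρ comes out at 1 (up to floating-point rounding), while Pearson's r on the raw values stays below 1.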

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
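
A minimal sketch of this calculation, using made-up data, is shown below. The helper `pearson_r` is a hypothetical name; the second call checks the invariance under shifts and positive rescaling mentioned above.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their std devs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())


# Toy data (assumed for illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_shifted = pearson_r(10 * x + 3, 0.5 * y - 7)
print(r, r_shifted)
```

Both calls return the same value, since centering removes the shift and the scale factors cancel between numerator and denominator.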

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
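
The pair-counting description can be sketched directly as below. This is the simple tau-a variant, which assumes no ties; library implementations typically use a tie-corrected variant, so results can differ on tied data. The function name and the toy data are assumptions for this example.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Toy data (assumed): 8 concordant and 2 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

tau = kendall_tau_a(x, y)
print(tau)
```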

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
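
For readers who want to see the shape of a bias-corrected Cramér's V, here is a hedged sketch based on the commonly cited form of Bergsma's correction. It is not claimed to match the report's internal implementation; the function name and the toy categorical data are assumptions.

```python
import numpy as np
import pandas as pd


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V from the contingency table of a and b."""
    table = pd.crosstab(pd.Series(a, name="a"), pd.Series(b, name="b"))
    observed = table.to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence, then the chi-squared statistic.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    phi2 = chi2 / n
    r, k = observed.shape
    # Bias correction of phi^2 and of the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return float(np.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1)))


# Toy data (assumed): one label that mirrors room type, one that is unrelated.
room = ["Private room"] * 20 + ["Entire home/apt"] * 20
same_split = ["A"] * 20 + ["B"] * 20
unrelated = (["A"] * 10 + ["B"] * 10) * 2

v_perfect = cramers_v_corrected(room, same_split)
v_indep = cramers_v_corrected(room, unrelated)
print(v_perfect, v_indep)
```

The first pair is perfectly associated and the second is exactly balanced across categories, so the two values sit at the ends of the 0 to 1 range.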

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.
  • Count
  • Matrix
  • Heatmap
  • Dendrogram

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
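
The nullity count and nullity correlation described above can be approximated with plain pandas, as in the sketch below. The four-row frame is made-up data that mimics the pattern visible in the sample rows (listings without reviews have both `last_review` and `reviews_per_month` missing); it is an assumption, not the real dataset, and this is not the report's own plotting code.

```python
import numpy as np
import pandas as pd

# Toy frame (assumed values) with the same column names as the dataset.
df = pd.DataFrame({
    "name": ["a", "b", None, "d"],
    "last_review": ["2018-10-19", None, "2019-07-05", None],
    "reviews_per_month": [0.21, np.nan, 4.64, np.nan],
    "price": [149, 150, 89, 80],
})

# Nullity by column: number of missing cells per variable.
null_counts = df.isnull().sum()

# Nullity correlation: correlate the 0/1 "is missing" indicators,
# keeping only columns that have at least one missing value.
nullity = df.isnull()
nullity = nullity.loc[:, nullity.any()]
nullity_corr = nullity.astype(int).corr()

print(null_counts)
print(nullity_corr)
```

A correlation of 1 between two indicators means the two columns are always missing together, which is what the two review columns show here.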

First rows

id name host_id host_name neighbourhood_group neighbourhood latitude longitude room_type price minimum_nights number_of_reviews last_review reviews_per_month calculated_host_listings_count availability_365
0 2539 Clean & quiet apt home by the park 2787 John Brooklyn Kensington 40.64749 -73.97237 Private room 149 1 9 2018-10-19 0.21 6 365
1 2595 Skylit Midtown Castle 2845 Jennifer Manhattan Midtown 40.75362 -73.98377 Entire home/apt 225 1 45 2019-05-21 0.38 2 355
2 3647 THE VILLAGE OF HARLEM....NEW YORK ! 4632 Elisabeth Manhattan Harlem 40.80902 -73.94190 Private room 150 3 0 NaN NaN 1 365
3 3831 Cozy Entire Floor of Brownstone 4869 LisaRoxanne Brooklyn Clinton Hill 40.68514 -73.95976 Entire home/apt 89 1 270 2019-07-05 4.64 1 194
4 5022 Entire Apt: Spacious Studio/Loft by central park 7192 Laura Manhattan East Harlem 40.79851 -73.94399 Entire home/apt 80 10 9 2018-11-19 0.10 1 0
5 5099 Large Cozy 1 BR Apartment In Midtown East 7322 Chris Manhattan Murray Hill 40.74767 -73.97500 Entire home/apt 200 3 74 2019-06-22 0.59 1 129
6 5121 BlissArtsSpace! 7356 Garon Brooklyn Bedford-Stuyvesant 40.68688 -73.95596 Private room 60 45 49 2017-10-05 0.40 1 0
7 5178 Large Furnished Room Near B'way 8967 Shunichi Manhattan Hell's Kitchen 40.76489 -73.98493 Private room 79 2 430 2019-06-24 3.47 1 220
8 5203 Cozy Clean Guest Room - Family Apt 7490 MaryEllen Manhattan Upper West Side 40.80178 -73.96723 Private room 79 2 118 2017-07-21 0.99 1 0
9 5238 Cute & Cozy Lower East Side 1 bdrm 7549 Ben Manhattan Chinatown 40.71344 -73.99037 Entire home/apt 150 1 160 2019-06-09 1.33 4 188

Last rows

id name host_id host_name neighbourhood_group neighbourhood latitude longitude room_type price minimum_nights number_of_reviews last_review reviews_per_month calculated_host_listings_count availability_365
48885 36482809 Stunning Bedroom NYC! Walking to Central Park!! 131529729 Kendall Manhattan East Harlem 40.79633 -73.93605 Private room 75 2 0 NaN NaN 2 353
48886 36483010 Comfy 1 Bedroom in Midtown East 274311461 Scott Manhattan Midtown 40.75561 -73.96723 Entire home/apt 200 6 0 NaN NaN 1 176
48887 36483152 Garden Jewel Apartment in Williamsburg New York 208514239 Melki Brooklyn Williamsburg 40.71232 -73.94220 Entire home/apt 170 1 0 NaN NaN 3 365
48888 36484087 Spacious Room w/ Private Rooftop, Central loca... 274321313 Kat Manhattan Hell's Kitchen 40.76392 -73.99183 Private room 125 4 0 NaN NaN 1 31
48889 36484363 QUIT PRIVATE HOUSE 107716952 Michael Queens Jamaica 40.69137 -73.80844 Private room 65 1 0 NaN NaN 2 163
48890 36484665 Charming one bedroom - newly renovated rowhouse 8232441 Sabrina Brooklyn Bedford-Stuyvesant 40.67853 -73.94995 Private room 70 2 0 NaN NaN 2 9
48891 36485057 Affordable room in Bushwick/East Williamsburg 6570630 Marisol Brooklyn Bushwick 40.70184 -73.93317 Private room 40 4 0 NaN NaN 2 36
48892 36485431 Sunny Studio at Historical Neighborhood 23492952 Ilgar & Aysel Manhattan Harlem 40.81475 -73.94867 Entire home/apt 115 10 0 NaN NaN 1 27
48893 36485609 43rd St. Time Square-cozy single bed 30985759 Taz Manhattan Hell's Kitchen 40.75751 -73.99112 Shared room 55 1 0 NaN NaN 6 2
48894 36487245 Trendy duplex in the very heart of Hell's Kitchen 68119814 Christophe Manhattan Hell's Kitchen 40.76404 -73.98933 Private room 90 7 0 NaN NaN 1 23
Report generated with pandas-profiling.
10000

https://www.geeksforgeeks.org/exploratory-data-analysis-in-python/

latitude longitude magnitude
0 65.193300 -149.072500 1.70
1 38.791832 -122.780830 2.10
2 38.818001 -122.792168 0.48
3 33.601667 -116.727667 0.78
4 37.378334 -118.520836 3.64

poi lat lon
metropolitan_museum_of_art 40.77940 -73.96310
central_park 40.78222 -73.96527
museum_of_modern_art 40.76150 -73.97739
statue_of_liberty 40.68916 -74.04444
empire_state_building 40.74861 -73.98566
guggenheim_museum 40.78305 -73.95888
times_square 40.75700 -73.98600
brooklyn_bridge 40.70570 -73.99640
american_museum_of_natural_history 40.78145 -73.97383
grand_central_terminal 40.75291 -73.97724
    high_line 40.74833 -74.00500

Iterate through each rental (row) in the Airbnb dataset and, for each one, iterate through the POI dataframe and calculate the distance from the rental to each POI.
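
One possible way to carry out this step is sketched below. The notebook does not say which distance measure it uses, so the haversine (great-circle) formula with a mean Earth radius of 6371 km is an assumption, as are the helper name `haversine_km` and the `dist_<poi>_km` column names. The two rentals and two POIs are copied from the tables shown earlier.

```python
import math

import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    radius_km = 6371.0  # mean Earth radius (assumed)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Two rentals and two POIs taken from the tables above.
rentals = pd.DataFrame({
    "id": [2539, 2595],
    "latitude": [40.64749, 40.75362],
    "longitude": [-73.97237, -73.98377],
})
poi = pd.DataFrame(
    {"lat": [40.75700, 40.78222], "lon": [-73.98600, -73.96527]},
    index=pd.Index(["times_square", "central_park"], name="poi"),
)

# Outer loop over rentals, inner loop over POIs, as described in the text.
rows = []
for rental in rentals.itertuples():
    dists = {}
    for name, p in poi.iterrows():
        dists[f"dist_{name}_km"] = haversine_km(
            rental.latitude, rental.longitude, p["lat"], p["lon"]
        )
    rows.append(dists)

rentals = rentals.join(pd.DataFrame(rows, index=rentals.index))
print(rentals)
```

The nested loop mirrors the description but scales with rentals times POIs; for the full dataset a vectorized NumPy version of the same formula would likely be faster.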
